Labeled Grammar Induction with Minimal Supervision
Abstract
Nearly all work in unsupervised grammar induction aims to induce unlabeled dependency trees from gold part-of-speech-tagged text. These clean linguistic classes provide a very important, though unrealistic, inductive bias. Conversely, induced clusters are very noisy. We show here, for the first time, that very limited human supervision (three frequent words per cluster) may be required to induce labeled dependencies from automatically induced word clusters.
Related resources
Probing the Linguistic Strengths and Limitations of Unsupervised Grammar Induction
Work in grammar induction should help shed light on the amount of syntactic structure that is discoverable from raw word or tag sequences. But since most current grammar induction algorithms produce unlabeled dependencies, it is difficult to analyze what types of constructions these algorithms can or cannot capture, and, therefore, to identify where additional supervision may be necessary. This...
Learning Data Driven Representations from Large Collections of Multidimensional Patterns with Minimal Supervision
By Parvez Ahammad. Doctor of Philosophy in Engineering, Electrical Engineering and Computer Sciences, University of California, Berkeley. Professor S. Shankar Sastry, Chair. Traditionally, taking experimental measurements of a physical or biological phenomenon was an expensive, laborious...
A Bayesian Model for Supervised Grammar Induction of Natural Language
In this paper, we show that the problem of grammar induction could be modeled as a combination of several model selection problems. We use the infinite generalization of a Bayesian model of cognition to solve each model selection problem in our grammar induction model. This Bayesian model is capable of solving model selection problems, consistent with human cognition. We also show that using th...
Evaluating Induced CCG Parsers on Grounded Semantic Parsing
We compare the effectiveness of four different syntactic CCG parsers for a semantic slot-filling task to explore how much syntactic supervision is required for downstream semantic analysis. This extrinsic, task-based evaluation also provides a unique window into the semantics captured (or missed) by unsupervised grammar induction systems.
Supervised Grammar Induction using Training Data with Limited Constituent Information
Corpus-based grammar induction generally relies on hand-parsed training data to learn the structure of the language. Unfortunately, building large annotated corpora is prohibitively expensive. This work aims to improve the induction strategy when there are few labels in the training data. We show that the most informative linguistic constituents are the higher nodes in the parse tre...